Practical data analysis involves many implicit or explicit assumptions 
about the good behavior of the data, and excludes consideration of vari- 
ous potentially pathological or limit cases. In this work, we present a new 
general theory of data, and of data processing, to bypass some of these 
assumptions. The new framework presented is focused on integration, 
and has direct applicability to expectation, distance, correlation, and ag- 
gregation. In a case study, we seek to reveal faint structure in financial 
data. Our new foundation for data encoding and handling offers increased 
justification for our conclusions. 

Keywords: data coding, data encoding, data valuation, correspondence analy- 
sis, hierarchical clustering, geometric Brownian motion, financial modeling, time 
series prediction, data aggregation 

1 Introduction 

We develop a theory of data for contingency table data analysis, a priority area 
of application of correspondence analysis. Much of the foundation of data 
theory that we discuss is quite general to data analysis, and independent of 
correspondence analysis. Motivation includes the following. 

Correspondence analysis is carried out on a cloud of points (rows, columns) 
through finding of principal directions of elongation, etc. What legitimizes our 
assumption of a compact cloud of points? More generally, what legitimizes our 
data analysis of a given data set, when we assume that the data set is a sampling 
of facets or events (which are to be explained and interpreted through the data 
analysis)? Should we instead allow for singularities or other pathologies or 
irregularities in such a cloud of points? The data analyst, in a somewhat slipshod 
approach to analyzing data, ignores such issues, and instead cavalierly takes 
data as sometimes discrete and sometimes continuous. As an example of such 
singularities, consider the preprocessing of data using normalization through 
taking the logarithm (common in dealing with astronomical stellar magnitudes, 
or financial ratios). Such normalization can potentially give rise to undefined 
data values. Why do we consider that our input data sets do not also contain 
undefined data values? In all generality, what justifies the ruling out of such 
pathologies in our input data? 

The number of attributes used to characterize our observations is possibly 
infinite. Can our general foundations cope with this? A priori the answer is 
clearly no. In this article, we describe a foundation for data analysis, based on 
Henstock's approach to integration, which allows us to bypass such pitfalls in a 
rigorous manner. 

We need a theory which begins with empirical distribution functions deduced 
from empirical data (i) for which there is no analytical description, and (ii) that 
are amenable to empirical computation. 

We propose in this article a foundation for data analysis which is at the level 
of the data, rather than at higher levels of model fitting, so that we are fully 
compatible thereafter with all statistical modeling approaches. In passing we 
will note how quantitative and qualitative data coding are encompassed within 
our approach (in section [3]). Neither can be considered as the more legitimate. 
There is no one necessary a priori statistical model to be used because there 
is no one necessary a priori morphology for a data cloud. (See section [SJ) 
Nor is there any one necessary level of resolution in data encoding (section [5]) . 
Empirical distribution functions can be deduced from empirical data for which 
there is no analytical description; and then the Riemann sums, with their finite 
number of terms, are amenable to empirical computation. 

In multivariate data analysis, the input data set is assumed to be represen- 
tative and comprehensive. However the former cannot do justice to an unknown 
(and perhaps unknowable) underlying (physical, social, etc.) reality. The latter 
is approximated very crudely in practice. Can these goals of representativity 
and comprehensiveness even hypothetically be well approximated in practice? 
Only with the framework that we present in this article can pathologies be ex- 
cluded (in regard to representativity), and (in regard to comprehensiveness) can 
we be at ease with infinite dimensional spaces. 

As is clear from this list of motivations, we are concerned with the 
well-foundedness of numerical data, which will subsequently be subject to a 
statistical data analysis. The supposition that (multivariate, time series, etc.) data can 
be addressed as such has only been examined in terms of measurement theory 
(ordinal, interval, qualitative, quantitative, etc.) or levels of measurement by 
S.S. Stevens in the 1940s (see Velleman and Wilkinson, 1984). However, 
suppositions regarding input data have not been examined before in terms of the 
data set giving rise to a well-behaved and exploitable processing input. We will do 
so in this article by showing how the Henstock or generalized Riemann theory 
of integration also provides a basis for asserting: a numerical data set can be 
analyzed. The focus on integration, and the perspective introduced, is easily ex- 
tended to expectation, scalar product, distance, correlation, data aggregation, 
and so on. 

A word on terminology used here: all statistical analysis of data starts with 
(qualitative or quantitative) data in numeric form, presupposing a valuation 
function mapping facets (or events) of the domain studied onto numerical values. 
We speak of this as data valuation, or more usually in this context as data 
encoding. The bigger picture of data encoding together with data normalization 
or other preprocessing, or indeed processing in the data analysis pipeline, is 
referred to in this article as data coding. 

2 Integration Background 

Probability theory, with foundations provided by Kolmogorov, is based on prob- 
ability measures on algebras of events and based ultimately on the Lebesgue 
integral. Lebesgue's just happened to be the first of a number of such investiga- 
tions into the nature of mathematical integration during the twentieth century. 

Subsequent developments in integration, by Perron, Denjoy, Henstock and 
Kurzweil, have similar properties and were devised to overcome shortcomings 
in the Lebesgue theory. See Gordon (1994) for detailed comparison of modern 
theories of integration. However, theorists of probability and random variation 
have not yet really "noticed", or taken account of, these developments in the 
underlying concepts. There are many benefits to be reaped by bringing these 
fundamental new insights in integration or averaging to the study of random 
variation, and this article aims to demonstrate some of them in the context of 
data coding. 

It is possible to formulate a theory of random variation and probability, 
linked to data coding, on the basis of a conceptually simpler Riemann-type 
approach, and without reference to the more difficult theories of measure and 
Lebesgue integration. 

In particular it is possible to present a Riemann-type model of data encoding 
in which a valuation (potentially a data value) is a limit of Riemann sums formed 
by suitably partitioning the sample space in which the process x takes its 
values. See Muldowney (1999, 2000/2001). 

To contrast (traditional) Lebesgue and (more recent) Riemann integration, 
consider determining a mean value. Suppose the sample space is the set of real 
numbers, or a subset of them. If successive instances of the random variable 
are obtained, we might partition the resulting data into an appropriate number 
of classes; then select a representative value of the random variable from each 
class; multiply each of the representatives by the relative frequency of the class 
in which it occurs; and add up the products. The result is an estimate of the 
mean value of the random variable. Table [1] illustrates this procedure. The 
sample space is partitioned into intervals $I^{(j)}$ of the sample variable $x$, the 
random variable is $f(x)$, and the relative frequency of the class $I^{(j)}$ is $F(I^{(j)})$. 

Interval      Random variable   Relative frequency 
$I^{(1)}$     $f(x^{(1)})$      $F(I^{(1)})$ 
...           ...               ... 
$I^{(n)}$     $f(x^{(n)})$      $F(I^{(n)})$ 

Table 1: For each $j$, the number $x^{(j)}$ is a representative element selected from 
$I^{(j)}$ or its closure. The resulting estimate of the mean value of the random 
variable $f(x)$ is $\sum_{j=1}^{n} f(x^{(j)}) F(I^{(j)})$. 

Table 2: Here, $x$ is again a representative member of a sample space $\Omega$ which 
corresponds to the various potential occurrences or states in the "real world" 
in which measurements or observations are taking place on a variable whose 
values are unpredictable and which can only be estimated beforehand to within 
a degree of likelihood. A probability measure $P$ is posited on a sigma-algebra 
of events $A$. 
The approach to random variation that we are concerned with in this article 
consists of a formalization of this relatively simple Riemann sum technique 
which puts at our disposal powerful results in analysis such as the Dominated 
Convergence Theorem. 

In contrast, the Kolmogorov approach requires, as a preliminary, an excursion 
into abstract measurable subsets $A_j$ of the sample space $\Omega$ (Table [2]). 

In practice, $\Omega$ is often identified with the real numbers or some proper subset 
of them; or with a Cartesian product, finite or infinite, of such sets. In Table 
[2], numbers $y_j$ are chosen in the range of values of the random variable $f(x)$, 
and $A_j$ is $f^{-1}([y_{j-1}, y_j[)$. The resulting $\sum_j y_j P(A_j)$ is an estimate of the 
expected value of the random variable $f(x)$. But the $P$-measurable sets are 
mathematically abstruse, and they can place heavy demands on the understand- 
ing and intuition of anyone who is not well-versed in mathematical analysis. For 
instance, it can be difficult for a non-specialist to visualize a measurable set A 
in terms of laboratory, industrial or financial measurements of some real-world 
quantity. 

In contrast, the data classes $I^{(j)}$ of elementary statistics in Table [1] are easily 
understood as real intervals, of one or more dimensions; and these are the basis 
of the Riemann approach to random variation. 

To illustrate the Lebesgue-Kolmogorov approach, suppose $X$ is a normally 
distributed random variable in a sample space $\Omega$. Then we can represent $\Omega$ 
as $\mathbb{R}$, the set of real numbers; with $X$ represented as the identity mapping 
$X : \mathbb{R} \to \mathbb{R}$, $X(x) = x$; and with distribution function $F_X$ defined on the family 
$\mathcal{I}$ of intervals $I$ of $\mathbb{R}$, $F_X : \mathcal{I} \to [0,1]$: 

$$F_X(I) = \frac{1}{\sqrt{2\pi}} \int_I e^{-s^2/2}\, ds. \qquad (1)$$

Then, in the Lebesgue-Kolmogorov approach, we generate, from the distribution 
function $F_X$, a probability measure $P_X : \mathcal{A} \to [0,1]$ on the family $\mathcal{A}$ 
of Lebesgue measurable subsets of $\Omega = \mathbb{R}$. So the expectation $E^P(f)$ of any 
$P_X$-measurable function $f$ of $x$ is the Lebesgue integral $\int_\Omega f(x)\,dP_X$. With 
$\Omega$ identified as $\mathbb{R}$, this is just the Lebesgue-Stieltjes integral $\int_{\mathbb{R}} f(x)\,dF_X$, and, 
since $x$ is just the standard normal variable of ([1]), the latter integral reduces to 
the Riemann-Stieltjes integral, with Cauchy or improper extensions, since the 
domain of integration is the unbounded $\mathbb{R} = ]-\infty, \infty[$. 

In presenting this outline we have skipped over many steps, the principal ones 
being the probability calculus and the construction of the probability measure 
P. It is precisely these steps which cease to be necessary preliminaries if we take 
a generalized Riemann approach, instead of the Lebesgue-Kolmogorov one, in 
the study of random variation. 

Because the generalized Riemann approach does not make use of an abstract 
measurable space $\Omega$ as the sample space, from here onwards we will take as given 
the identification of the sample space with $\mathbb{R}$ or some subset of $\mathbb{R}$, or with a 
Cartesian product of such sets, and take the symbol $\Omega$ as denoting such a space. 
Accordingly we will drop the traditional notations $X$ and $f(X)$ for denoting 
random variables. Instead a random variable will be denoted by the variable 
(though unpredictable) element $x$ of the (now Cartesian) sample space, or by 
some function $f(x)$ of $x$. The associated likelihoods or probabilities will be given 
by a distribution function $F(I)$ defined on intervals (which may be Cartesian 
products of one-dimensional intervals) of $\Omega$. Whenever it is necessary to relate 
the distribution function $F$ to its underlying random variable $x$, we may write 
$F$ as $F_x$. 

3 A Generalized Riemann Approach: From Distribution Functions Rather Than From Probability Measures 

The standard approach starts with a probability measure $P$ defined on a 
sigma-algebra of measurable sets in an abstract sample space $\Omega$; it then deduces 
probability distribution functions $F$. These distribution functions (and not some 
abstract probability measure) are the practical starting point for the analysis of 
many actual random variables - normal (as described above in ([1])), exponential, 
Brownian, geometric Brownian, and so on, i.e. practical data analysis. 

In contrast, the generalized Riemann approach posits the probability 
distribution function $F$ as the starting point of the theory, and proceeds along the 
lines of the simpler and more familiar Table [1] instead of the more complicated 
and less intuitive Table [2]. 

To formalize the concepts, a random variable (or observable) is now taken 
to be a function $f(x)$ defined on a domain $\Omega = S^B = \prod\{S : B\}$ where $S$ is $\mathbb{R}$, or 
some subset of $\mathbb{R}$, and $B$ is an indexing set which may be finite or infinite; the 
elements of $\Omega$ being denoted by $x$; along with a likelihood function $F$ defined 
on the intervals of $\prod\{S : B\}$. 

In some basic examples such as throwing dice, S may be a set such as 
{1,2,3,4, 5, 6}, or, where there is repeated sampling, a Cartesian product of such 
sets. Alternatively, S will be the set of positive numbers R+. So quantitative 
and qualitative data encoding are easily supported. 

The Lebesgue-Kolmogorov approach develops probability distribution functions 
$F$ from probability measures $P(A)$ of measurable sets $A$. Even though 
distribution functions are often the starting point in practice (as in ([1]) above), 
Kolmogorov gives primacy to the probability measures $P$, and they are the 
basis of the calculus of probabilities, including the crucial relation 

$$P\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j) \qquad (2)$$

for disjoint measurable sets $A_j$. 

Viewed as an axiom, relation ([2]) is a somewhat mysterious statement about 
rather mysterious objects. But it is the lynch-pin of the Lebesgue-Kolmogorov 
theory, and without it the twentieth century understanding of random variation 
would have been impossible. 

The generalized Riemann approach starts with probability distribution functions 
$F_x$ defined only on intervals $I$ of the sample space $\Omega = S^B$. We can, as 
shown below in ([12]), deduce from this approach probability functions $P_x$ defined 
on a broader class of "integrable" sets $A$, and a calculus of probabilities which 
includes the relation ([2]), but as a theorem rather than an axiom. 

What, if any, is the relationship between these two approaches to random 
variation? There is a theorem (Muldowney and Skvortsov, 2001/2002) which 
states that every Lebesgue integrable function (in $\mathbb{R}^B$) is also generalized 
Riemann integrable. In effect, this guarantees that every result in the 
Lebesgue-Kolmogorov theory also holds in the generalized Riemann approach. So, in this 
sense, the former is a special case of the latter. 

The key point in developing a rigorous theory of random variation (which 
supports data valuation and hence data analysis) by means of generalized 
Riemann integration is, following the scheme of Table [1], to partition the domain 
or sample space $\Omega = S^B$, in an appropriate way, as we shall proceed to show. 
(Whereas in the Lebesgue-Kolmogorov-Ito approach we step back from Table [1] 
and instead use Table [2] supported by ([2]). The two approaches part company 
at the Tables [1] and [2] stage.) 

In the generalized Riemann approach we focus on the classification of the 
sample data into mutually exclusive classes or intervals $I$. That is, through data 
encoding we undertake partitioning of the sample space $\Omega = S^B$ into mutually 
exclusive intervals $I$. 

In pursuing a rigorous theory of random variation along these lines, this 
basic idea of partitioning the sample space is the key. Instead of retreating to 
the abstract (Kolmogorov measures on subsets) machinery of Table [2], we find a 
different way ahead by carefully selecting the intervals $I^{(j)}$ which partition the 
sample space $\Omega = \mathbb{R}^B$. 

4 Riemann Sums 

An idea of what is involved in this can be obtained by recalling the role of 
Riemann sums in basic integration theory. Suppose for simplicity that the 
sample space $\Omega$ is the interval $[a, b[ \subset \mathbb{R}$ and the random variable $f(x)$ is given 
by $f : \Omega \to \mathbb{R}$; and suppose $F : \mathcal{I} \to [0, 1]$ where $\mathcal{I}$ is the family of subintervals 
$I \subseteq \Omega = [a, b[$. 

We can interpret $F$ as the probability distribution function of the underlying 
random variable $x$, so $F(I)$ is the likelihood that $x \in I$. As a distribution 
function, $F$ is finitely additive on $\mathcal{I}$. 

The simplest intuition of likelihood, as something intermediate between 
certainty of non-occurrence and certainty of occurrence, implies that likelihoods 
must be representable as numbers between 0 and 1. It follows that 
distribution functions are finitely additive on $\mathcal{I}$. This immediately lifts the burden of 
credulity that ([2]) imposes on our naive or "natural" sense of what probability 
or likelihood is. 

With $f$ a deterministic function of the random variable $x$, the random 
variation of $f(x)$ is our object of investigation. In the first instance we wish to 
establish $E(f)$, the expected value of $f(x)$, as, in some sense, the integral of $f$ 
with respect to $F$, which is often estimated as in Table [1]. 

Following broadly the scheme of Table [1], we first select an arbitrary number 
$\delta > 0$. Then we choose a finite number of disjoint intervals $I^1, \ldots, I^n$; 
$I^j = [u^{j-1}, u^j[$, $a = u^0 < u^1 < \cdots < u^n = b$, with each interval $I^j$ satisfying 

$$|I^j| := u^j - u^{j-1} < \delta. \qquad (3)$$

We then select a representative $x^j$ with $u^{j-1} \le x^j \le u^j$, $1 \le j \le n$. 
(For simplicity we are using superscript $j$ instead of $(j)$, for labelling, not 
exponentiation. The reason for not using subscript $j$ is to keep such subscripts 
available to denote dimensions in multi-dimensional variables.) 

Then the Riemann (or Riemann-Stieltjes) integral of $f$ with respect to $F$ 
exists, with $\int_a^b f(x)\,dF = \alpha$, if, given any $\varepsilon > 0$, there exists a number $\delta > 0$ so 
that 

$$\left| \sum_{j=1}^{n} f(x^j) F(I^j) - \alpha \right| < \varepsilon \qquad (4)$$

for every such choice of $x^j$, $I^j$ satisfying ([3]), $1 \le j \le n$. 

If we could succeed in creating a theory of random variation along these lines 
then we could reasonably declare that the expectation $E^F(f)$ of the random 
variable $f(x)$, relative to the distribution function $F(I)$, is $\int_a^b f(x)\,dF$ whenever 
the latter exists in the sense of ([4]). (In fact this statement is true, but a 
justification of it takes us deep into the Kolmogorov theory of probability and 
random variation. A different justification is given in this article.) 

But ([3]) and ([4]) on their own do not yield an adequate theory of random 
variation. For one thing, it is well known that not every Lebesgue integrable 
function is Riemann integrable. So in this sense at least, Table [2] goes further 
than Table [1] and relation ([3]). 

More importantly, any theory of random variation must contain results such 
as Central Limit Theorems and Laws of Large Numbers, which are the core of 
our understanding of random variation, and the proofs of such results require 
theorems like the Dominated Convergence Theorem, which are available for 
Table [2] and Lebesgue integrals, but which are not available for the ordinary 
Riemann integrals of Table [1] and ([4]). 

However, before we take further steps towards the generalization of the Rie- 
mann integral ([4]) which will give us what we need, let us pause to give further 
consideration to data encoding. 

Though the classes $I^j$ used above are not required to be of equal 
length, it is certainly consistent with that definition to partition the sample data into equal 
classes. To see this, choose $n$ so that $(b - a)/n < \delta$, and then choose each $u^j$ 
so that $u^j - u^{j-1} = (b - a)/n$. Then $I^j = [u^{j-1}, u^j[$ ($1 \le j \le n$) gives us a 
partition of $\Omega = [a, b[$ in which each $I^j$ has the same length $(b - a)/n$. 
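
As a minimal sketch of this equal-class construction, the following Python fragment builds division points with common length $(b - a)/n < \delta$ and forms the corresponding Riemann-Stieltjes sum. The function names, the uniform distribution function and the integrand $f(x) = x^2$ are hypothetical choices for illustration, and the left end point of each class is taken as its representative.

```python
import math

def equal_partition(a, b, delta):
    """Return points u^0 < ... < u^n of [a, b[ with equal steps (b - a)/n < delta."""
    n = math.floor((b - a) / delta) + 1       # n > (b - a)/delta, so the step is < delta
    return [a + (b - a) * j / n for j in range(n + 1)]

def riemann_stieltjes_sum(f, cdf, points):
    """Sum of f(x^j) * F(I^j), with I^j = [u^{j-1}, u^j[ and x^j = u^{j-1}."""
    total = 0.0
    for u_prev, u_next in zip(points, points[1:]):
        total += f(u_prev) * (cdf(u_next) - cdf(u_prev))   # F(I^j) = F(u^j) - F(u^{j-1})
    return total

# Hypothetical example: uniform distribution on [0, 1[, f(x) = x^2, so E(f) = 1/3
delta = 0.001
points = equal_partition(0.0, 1.0, delta)
approx = riemann_stieltjes_sum(lambda x: x * x, lambda u: u, points)
```

Shrinking `delta` refines the classes, and for this well-behaved integrand the sum approaches the exact expectation 1/3.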

We could also, in principle, obtain quantile classification of the data by 
this method of $\delta$-partitioning. Suppose we want decile classification; that is, 
$[a, b[ = I^1 \cup \cdots \cup I^n$ with $F(I^j) = 0.1$, $1 \le j \le n$, so that $n = 10$. This is possible, since the 
function $F(u) := F([a,u[)$ is monotone increasing and continuous for almost 
all $u \in ]a,b[$, and hence there exist $u^j$ such that $F(u^j) = j/10$ for $1 \le j \le 10$. 
So if $\delta$ happens to be greater than $\max\{u^j - u^{j-1} : 1 \le j \le 10\}$, then the 
decile classification satisfies $|I^j| = u^j - u^{j-1} < \delta$ for $1 \le j \le 10$. (This 
argument merely establishes the existence of such a classification. Actually 
determining quantile points for a particular distribution function requires ad 
hoc consideration of the distribution function in question.) 
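
For a distribution function that is continuous and increasing, the decile points can be located numerically. The sketch below uses plain bisection; the function name `quantile_points` and the example distribution $F(u) = u^2$ on $[0, 1[$ are assumptions made purely for illustration, not part of the theory above.

```python
def quantile_points(cdf, a, b, k=10, tol=1e-12):
    """Find points a = u^0 < u^1 < ... < u^k = b with cdf(u^j) = j/k by bisection.

    Assumes cdf is continuous and increasing on [a, b], with cdf(a) = 0, cdf(b) = 1.
    """
    points = [a]
    for j in range(1, k):
        target = j / k
        lo, hi = points[-1], b
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if cdf(mid) < target:
                lo = mid
            else:
                hi = mid
        points.append(0.5 * (lo + hi))
    points.append(b)
    return points

# Hypothetical distribution function F(u) = u^2 on [0, 1[
deciles = quantile_points(lambda u: u * u, 0.0, 1.0)
widths = [v - u for u, v in zip(deciles, deciles[1:])]
delta_needed = max(widths)    # the decile classes are delta-fine for any delta above this
```

In this example the widest class is the first one, so any constant $\delta$ larger than its length admits the decile classification.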

In fact, this focus on the system of data encoding is the avenue to a rigorous 
theory of random variation within a Riemann framework, as we shall now see. 

5 The Generalized Riemann Integral 

In the previous section we took the sample space $\Omega$ to be $[a, b[$. As our attention 
from here on is going to be (below in the application study) increasingly focussed 
on counts or frequencies, which are non-negative, we will take the sample space 
to be $\mathbb{R}_+ = ]0, \infty[$, or a multiple Cartesian product of $\mathbb{R}_+$ by itself. 

Figure [1] shows a partition of an unbounded finite-dimensional domain such 
as $\mathbb{R}_+ \times \mathbb{R}_+$. In this illustration, 

For each elementary occurrence $x \in \Omega = \mathbb{R}_+^n$ ($n$ a positive integer), let $\delta(x)$ 
be a positive number. Then an admissible classification of the sample space, 
called a $\delta$-fine division of $\Omega$, is a finite collection 

$$\mathcal{E}_\delta := \{(x^j, I^j)\}_{j=1}^{m} \qquad (6)$$

so that $x^j$ is in $I^j$ or its closure. The $I^j$ are disjoint with union $\Omega$, and the lengths of the 
edges (or sides) of each $I^j$ are bounded by $\delta(x^j)$. 

So, referring back to Table [1] of elementary statistics, what we are doing here 
is selecting the data classification intervals $I^j$ along with a representative value 
$x^j$ from $I^j$. 

It is convenient (though not a requirement of the theory) that the 
representative value $x^j$ should be a vertex of $I^j$, and that is how we shall proceed. 

In the case of the ordinary Riemann integral in a compact domain (cf. ([4])), 
the positive function $\delta$ is simply a positive constant, and the bound in question 
is simply the condition that each edge of each interval has length less than $\delta$. 
Ordinary Riemann integration over unbounded domains, or domains which contain 
singularity points of the integrand, is obtained by means of the improper 
Riemann integral (for details of which, see Rudin (1970) for instance). In contrast, 
the generalized Riemann integral handles all of these situations in essentially 
the same way, removing the need for improper extension. In the illustration in 
Figure [1] above, some of the edges are infinitely long. The precise sense in which 
each edge (finite or infinite) of $I^j$ is bounded by $\delta(x^j)$ is explained at the end 
of this section. 

Figure 1: Unbounded two-dimensional domain with partition used for data 
encoding. 

The Riemann sum corresponding to ([5]) is 

$$(\mathcal{E}_\delta) \sum f(x) F(I) := \sum_{j=1}^{m} f(x^j) F(I^j) \qquad (7)$$

i.e. it is simply the sum over the terms in that equation. We say that $f$ is 
generalized Riemann integrable with respect to $F$, with $\int_\Omega f(x) F(I) = \alpha$, if, for 
each $\varepsilon > 0$, there exists a function $\delta : \Omega \to \mathbb{R}_+$ so that, for every $\mathcal{E}_\delta$, 

$$\left| (\mathcal{E}_\delta) \sum f(x) F(I) - \alpha \right| < \varepsilon. \qquad (8)$$

With this step we overcome the two previously mentioned objections to the use 
of Riemann-type integration in a theory of random variation. Firstly, every 
function $f$ which is Lebesgue-Stieltjes integrable in $\Omega$ with respect to $F$ is also 
generalized Riemann integrable, in the sense of ([8]). See Gordon (1994) for a 
proof of this. Secondly, we have theorems such as the Dominated Convergence 
Theorem (see, for example, Gordon, 1994) which enable us to prove Laws of 
Large Numbers, Central Limit Theorems and other results which are needed for 
a theory of random variation. 

So we can legitimately use the usual language and notation of probability 
theory. Thus, the expectation of the random variable f(x) with respect to the 
probability distribution function F(I) is 

$$E^F(f(x)) = \int_\Omega f(x) F(I).$$

To recapitulate, elementary statistics involves calculations of the form shown in Table [1], often 
with classes $I$ of equal size or equal likelihood. We refine this method by carefully 
selecting the data classification intervals $I$. In fact our Riemann sum estimates 
involve choosing a finite number of occurrences $\{x^{(1)}, \ldots, x^{(m)}\}$ from $\Omega$ (actually, 
from the closure of $\Omega$), and then selecting associated classes $\{I^{(1)}, \ldots, I^{(m)}\}$, 
disjoint with union $\Omega$, with $x^{(j)}$ in $I^{(j)}$ (or with each $x^{(j)}$ a vertex of $I^{(j)}$ in the 
version of the theory that we are presenting here), such that for each $1 \le j \le m$, 
$(x^{(j)}, I^{(j)})$ is $\delta$-fine. The meaning of this is as follows. 

Let $\bar{\mathbb{R}}_+ = \mathbb{R}_+ \cup \{0, \infty\}$ be $\mathbb{R}_+$ with the points 0 and $\infty$ adjoined. (In the 
following paragraph, $x = 0$ and $x = \infty$ are given special treatment. Many 
functions are undefined for $x = \infty$; and $x = 0$ is a singularity for the function 
$\ln x$ which may be of use in data normalization - for instance when dealing with 
astronomy stellar magnitudes or financial ratios.) 

Let $I$ be an interval in $\mathbb{R}_+$, of the form 

$$]0, v[, \quad [u, v[, \quad \text{or} \quad [u, \infty[, \qquad (9)$$

and let $\delta : \bar{\mathbb{R}}_+ \to ]0, \infty[$ be a positive function defined for $x \in \bar{\mathbb{R}}_+$. The function 
$\delta$ is called a gauge in $\bar{\mathbb{R}}_+$. We say that $I$ is attached to $x$ (or associated with $x$) 
if 

$$x = 0, \quad x = u \text{ or } v, \quad x = \infty \qquad (10)$$

respectively. If $I$ is attached to $x$ we say that $(x, I)$ is $\delta$-fine (or simply that $I$ 
is $\delta$-fine) if 

$$v < \delta(x), \quad v - u < \delta(x), \quad u > \frac{1}{\delta(x)} \qquad (11)$$

respectively. 
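
The three cases of ([9]), ([10]) and ([11]) translate directly into a small predicate. The following Python sketch is an illustration only: the function name `is_delta_fine`, the representation of an interval by its end points `u` and `v`, and the constant gauge used in the example are all assumptions.

```python
import math

def is_delta_fine(x, u, v, gauge):
    """Check whether the interval with end points u < v, attached to x, is delta-fine.

    The interval is ]0, v[ when u == 0, [u, inf[ when v == inf, and [u, v[ otherwise.
    It is attached to x = 0, x = inf, or x in {u, v} respectively, as in (9)-(11).
    Returns False when x is not attached to the interval.
    """
    d = gauge(x)
    if u == 0.0 and x == 0.0:
        return v < d                    # ]0, v[ attached to 0
    if v == math.inf and x == math.inf:
        return u > 1.0 / d              # [u, inf[ attached to infinity
    if 0.0 < u < v < math.inf and x in (u, v):
        return v - u < d                # [u, v[ attached to one of its vertices
    return False

# Hypothetical constant gauge delta(x) = 0.5
gauge = lambda x: 0.5
near_zero = is_delta_fine(0.0, 0.0, 0.4, gauge)             # ]0, 0.4[ attached to 0
far_out = is_delta_fine(math.inf, 3.0, math.inf, gauge)     # [3, inf[ attached to infinity
too_long = is_delta_fine(1.0, 1.0, 2.0, gauge)              # [1, 2[ has length 1 >= 0.5
```

Note how the condition at infinity forces the unbounded class to start far from the origin when the gauge is small, which is what allows unbounded domains to be treated without improper extensions.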

That is what we mean by $\delta$-fineness in one dimension. What about higher 
dimensions? 

Suppose $I = I_1 \times I_2 \times \cdots \times I_n$ is an interval of $\mathbb{R}_+^n = \mathbb{R}_+ \times \mathbb{R}_+ \times \cdots \times \mathbb{R}_+$, each 
$I_j$ being a one-dimensional interval of form ([9]). A point $x = (x_1, x_2, \ldots, x_n)$ of 
$\bar{\mathbb{R}}_+^n$ is attached to $I$ in $\mathbb{R}_+^n$ if each $x_j$ is attached to $I_j$ in $\mathbb{R}_+$, $1 \le j \le n$. Given 
a function $\delta : \bar{\mathbb{R}}_+^n \to ]0, \infty[$, an associated pair $(x, I)$ is $\delta$-fine in $\mathbb{R}_+^n$ if each $I_j$ 
satisfies the relevant condition in ([11]) with the new $\delta(x)$. A finite collection 
of associated $(x, I)$ is a $\delta$-fine division of $\mathbb{R}_+^n$ if the intervals $I$ are disjoint with 
union $\mathbb{R}_+^n$, and if each of the $(x, I)$ is $\delta$-fine. A proof of the existence of such a 
$\delta$-fine division is given in Henstock (1988), Theorem 4.1. 

A glance at Figure [1] above will show that many of the points $x$ involved 
in a division of $\mathbb{R}_+^n$ (vertices of the partitioning intervals), which correspond to 
the representative occurrences $x^{(j)}$ of the data encoding in Table [1], will belong 
to $\bar{\mathbb{R}}_+^n \setminus \mathbb{R}_+^n$; in other words $x$ may have some components $x_j$ equal to 0 or 
$\infty$. The special arrangements we have made for such points, in ([11]) above, 
are in anticipation of the singularities that are present at such points in the 
expressions that arise in our data encoding problem. These arrangements, which 
are characteristic of generalized Riemann integration, forestall any need for the 
kind of improper extensions which are needed in other integration theories. 
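
To make the one-dimensional construction concrete, the following Python sketch builds a $\delta$-fine division of $]0, \infty[$ and evaluates a Riemann sum of the form ([7]). It rests on several assumptions that are not taken from the text: the gauge is a constant (only the simplest case; the strength of the theory lies in non-constant gauges), terms whose representative point is 0 or $\infty$ are taken to contribute zero, and the unit exponential distribution with $f(x) = x$ is a hypothetical example. All function names are illustrative.

```python
import math

def constant_gauge_division(c):
    """Delta-fine division of ]0, inf[ for the constant gauge delta(x) = c.

    Returns triples (x, u, v): the attached point x and the end points of its interval.
    """
    h = 0.5 * c                                        # common length of bounded classes, h < c
    n = math.ceil(1.0 / (c * h))                       # enough classes so that (n + 1)h > 1/c
    points = [k * h for k in range(1, n + 2)]          # h, 2h, ..., (n + 1)h
    division = [(0.0, 0.0, points[0])]                 # ]0, h[ attached to 0
    division += [(u, u, v) for u, v in zip(points, points[1:])]   # [u, v[ attached to u
    division.append((math.inf, points[-1], math.inf))  # [(n + 1)h, inf[ attached to infinity
    return division

def generalized_riemann_sum(f, F, division):
    """Sum of f(x) * F(I) over the division, as in (7)."""
    total = 0.0
    for x, u, v in division:
        if x == 0.0 or x == math.inf:
            continue          # assumed convention: such terms contribute zero
        total += f(x) * F(u, v)
    return total

def exp_F(u, v):
    """F([u, v[) for the unit exponential distribution (hypothetical example)."""
    return math.exp(-u) - math.exp(-v)

c = 0.05
division = constant_gauge_division(c)
approx = generalized_riemann_sum(lambda x: x, exp_F, division)   # expectation is 1
```

With a smaller constant `c` the bounded classes shrink and the unbounded class moves further out simultaneously, so a single parameter controls both the resolution and the truncation of the domain.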



6 But Where Is The Calculus of Probabilities? 

There are certain familiar landmarks in the study of probability theory and its 
offshoots, such as the calculus of probabilities, which have not entered into the 
discussion thus far. The key point in this calculus is the relationship 

$$P\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j).$$

In fact the set-functions $P$ and their calculus are not used as the basis of the 
generalized Riemann approach to the study of random variation. Instead, the basis 
is the simpler set-functions $F$, defined only on intervals, and finitely additive on 
them. 

But, as mentioned earlier, an outcome of the generalized Riemann approach 
is that we can recover set-functions defined on sets (including the measurable 
sets of the Kolmogorov theory) which are more general than intervals, and we 
can recover the probability calculus which is associated with them. 

To see this, suppose $A \subset \Omega$ is such that $\int_\Omega 1_A(x) F(I)$ exists in the sense of 
([8]). Then define 

$$P_F(A) = \int_\Omega 1_A(x) F(I), \qquad (12)$$

and we can easily deduce from the Dominated Convergence Theorem for 
generalized Riemann integrals, that for disjoint $A_j$ for which $P_F(A_j)$ exists, 

$$P_F\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P_F(A_j).$$

Other familiar properties of the calculus of probabilities are easily deduced from 
([12]). 
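
A small numerical illustration of ([12]) is given below, restricted to finite additivity on a bounded sample space; it does not demonstrate the countable case. The uniform distribution on $[0, 1[$, the two sets and the function name `indicator_sum` are hypothetical, and a Riemann sum over equal classes stands in for the integral.

```python
def indicator_sum(in_A, cdf, points):
    """Riemann sum of the indicator 1_A with respect to F over [points[0], points[-1][."""
    total = 0.0
    for u_prev, u_next in zip(points, points[1:]):
        if in_A(u_prev):                           # representative x^j = left vertex of I^j
            total += cdf(u_next) - cdf(u_prev)     # F(I^j)
    return total

n = 1000
points = [j / n for j in range(n + 1)]             # equal classes of [0, 1[
uniform_cdf = lambda u: u                          # hypothetical F: uniform on [0, 1[

in_A1 = lambda x: 0.1 <= x < 0.3                   # A_1 = [0.1, 0.3[
in_A2 = lambda x: 0.5 <= x < 0.6                   # A_2 = [0.5, 0.6[, disjoint from A_1
p_A1 = indicator_sum(in_A1, uniform_cdf, points)
p_A2 = indicator_sum(in_A2, uniform_cdf, points)
p_union = indicator_sum(lambda x: in_A1(x) or in_A2(x), uniform_cdf, points)
```

Here the value obtained for the union agrees, up to rounding, with the sum of the values for the two disjoint sets.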

Since every Lebesgue integrable function is also generalized Riemann inte- 
grable (Gordon, 1994), every result obtained by Lebesgue integration is also 
valid for generalized Riemann integration. So in this sense, the generalized Rie- 
mann theory of random variation is an extension or generalization of the theory 
developed by Kolmogorov, Levy, Ito and others. 

However the kind of argument which is natural for Lebesgue integration 
is different from that which would naturally be used in generalized Riemann 
integration, so it is more productive in the latter case to develop the theory of 
random variation from first principles on Riemann lines. Some pointers to such 
a development are given in (Muldowney, 1999). 

Many of the standard distributions (normal, exponential and others) are 
mathematically elementary, and the expected or average values of random vari- 
ables, with respect to these distributions — whether computed by means of the 
generalized Riemann or Lebesgue methods — often reduce to Riemann or Riemann- 
Stieltjes integrals. Many aspects of these distributions can be discovered with 
ordinary Riemann integration. But it is their existence as generalized Riemann 
integrals, possessing properties such as the Dominated Convergence Theorem 
and Fubini's Theorem, that gives us access to a full-blown theory of random 
variation. 

7 Marginal Distributions and Statistical Independence 

When random variables $\{x_t\}_{t \in B}$ are being considered jointly, their marginal 
behavior is a primary consideration. This means examining the joint behavior 
of any finite subset of the variables, the remaining ones (whether finitely or 
infinitely many) being arbitrary or left out of consideration. Thus we are led to 
families 

$$\{x_t : t \in N\}_{N \subseteq B}$$

where the sets $N$ belong to the family $T$ of finite subsets of $B$, the set $B$ being 
itself finite or infinite. (When $B$ is infinite the family $(x_t)_{t \in B}$ is often called 
a process or stochastic process, especially when the variable $t$ represents time. 
We will write the random variable $x_t$ as $x(t)$ depending on the context; likewise 
$x_{t_j} = x(t_j) = x_j$.) In the following discussion we will suppose, for illustrative 
purposes, that for each $t$ the domain of values of $x_t$ is the set $\mathbb{R}_+$ of positive 
numbers. This would apply if, for instance, $(x_t)$ is price history, $t \in B$. 

The marginal behavior of a process is specified by marginal distribution 
functions. The marginal distribution function of the random variable or process 
$x_B = (x_t)_{t \in B}$, for any finite subset $N = \{t_1, t_2, \ldots, t_n\} \subset B$, is the function 

$$F_{(x_1, x_2, \ldots, x_n)}(I_1 \times I_2 \times \cdots \times I_n) \qquad (13)$$

defined on the intervals $I_1 \times \cdots \times I_n$ of $\mathbb{R}_+^n$, which we interpret as the likelihood 
that the random variable $x_j$ takes a value in the one-dimensional interval $I_j$ 
for each $j$, $1 \le j \le n$; with the remaining random variables $x_t$ arbitrary for 
$t \in B \setminus N$. 

One of the uses to which the marginal behavior is put is to determine the
presence or absence of independence. The family of random variables $(x_t)_{t \in B}$ is
independent if the marginal distribution functions satisfy

$F_{(x_1, x_2, \ldots, x_n)}(I_1 \times I_2 \times \cdots \times I_n) = F_{x_1}(I_1) \times F_{x_2}(I_2) \times \cdots \times F_{x_n}(I_n)$

for every finite subset $N = \{t_1, \ldots, t_n\} \subset B$. That is, the likelihood that the
random variables $x_{t_1}, x_{t_2}, \ldots, x_{t_n}$ jointly take values in $I_1, I_2, \ldots, I_n$ (with $x_t$
arbitrary for $t \in B \setminus N$) is the product over j = 1, 2, ..., n of the likelihoods of
$x_{t_j}$ belonging to $I_j$ (with $x_t$ arbitrary for $t \ne t_j$, j = 1, 2, ..., n) for every choice
of such intervals, and for every choice of finite subset N of B.

Of course, if B is itself finite, it is sufficient to consider only N = B in order 
to establish whether or not the random variables are independent. 
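
As an illustration of the product condition in the finite case, the following
minimal sketch checks whether a table of joint likelihoods over a grid of
intervals factorizes into its marginals. The function name `is_independent`, the
tolerance, and the two example tables are assumptions made for illustration;
they are not taken from the analysis in this article.

```python
import numpy as np

def is_independent(joint, tol=1e-9):
    """Return True if a 2-D table of joint likelihoods F(I_1 x I_2)
    equals the outer product of its marginals F(I_1) * F(I_2)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalise to total likelihood 1
    row_marginal = joint.sum(axis=1)     # likelihoods of the intervals for x_1
    col_marginal = joint.sum(axis=0)     # likelihoods of the intervals for x_2
    product = np.outer(row_marginal, col_marginal)
    return bool(np.allclose(joint, product, atol=tol))

# Hypothetical tables: the first is built as a product, the second is not.
independent_table = np.outer([0.2, 0.8], [0.5, 0.3, 0.2])
dependent_table = np.array([[0.4, 0.1], [0.1, 0.4]])

print(is_independent(independent_table))
print(is_independent(dependent_table))
```

With B finite, a single check of this kind over N = B suffices; with B infinite,
the same condition has to hold for every finite subset N.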

8 Cylindrical Intervals to Support Infinite Dimensional Spaces

When B is infinite (so the random variable $x = (x(t))_{t \in B}$ is a stochastic process),
it is usual to define the distribution of x as the family of distribution functions

$\{F_{(x(t_1), x(t_2), \ldots, x(t_n))}(I_1 \times I_2 \times \cdots \times I_n) : \{t_1, t_2, \ldots, t_n\} \subset B\}$ (14)

This is somewhat awkward, since up to this point the distribution of a ran- 
dom variable has been given as a single function defined on intervals of the 
sample space, and not as a family of functions. However we can tidy up this 
awkwardness as follows. 

Firstly, the sample space $\Omega$ is now the Cartesian product $\prod_{t \in B} \mathbf{R}_+ = \mathbf{R}_+^B$. Let
$\mathcal{F}$ denote the family of finite subsets $N = \{t_1, t_2, \ldots, t_n\}$ of B. Then for any
$N \in \mathcal{F}$, the set

$I[N] := I_{t_1} \times I_{t_2} \times \cdots \times I_{t_n} \times \prod\{\mathbf{R}_+ : t \in B \setminus N\}$

is called a cylindrical interval. Taking all choices of $N \in \mathcal{F}$ and all choices of
one-dimensional intervals $I_j$ ($t_j \in N$), denote the resulting class of cylindrical
intervals by $\mathcal{I}$. These cylindrical intervals are the subsets of the sample space
that we need to define the distribution function F of x in $\mathbf{R}_+^B$:

$F(I[N]) := F_{(x(t_1), x(t_2), \ldots, x(t_n))}(I_{t_1} \times I_{t_2} \times \cdots \times I_{t_n})$ (15)

for every $N \in \mathcal{F}$ and every $I[N] \in \mathcal{I}$.

By thus defining the distribution function F (of the underlying random
variable $x \in \mathbf{R}_+^B$) on the family of subsets $\mathcal{I}$ (the cylindrical intervals) of $\mathbf{R}_+^B$, we
are in conformity with the system used for describing distribution functions in
finite-dimensional sample spaces.

As in the elementary situation of Table 1, it naturally follows, if we want
to estimate the expected value of some deterministic function of the random
variable (or process) $(x(t))_{t \in B}$, that the joint sample space $\Omega = \mathbf{R}_+^B$ of the
individual random variables $x(t)$ should be partitioned by means of cylindrical
intervals I[N].

To demonstrate such a partition, we suppose B is the time interval $]\tau, T]$, so
the sample space $\Omega$ is $\mathbf{R}_+^B = \prod_{t \in ]\tau, T]} \mathbf{R}_+ = \mathbf{R}_+^{]\tau, T]}$. Suppose

$\tau = t_0 < t_1 < t_2 < \cdots < t_n = T$,

and, with N denoting $\{t_1, t_2, \ldots, t_n\}$, suppose

$I[N] = I_1 \times I_2 \times \cdots \times I_n \times \mathbf{R}_+^{B \setminus N}$

is one of the cylindrical intervals forming a partition of $\mathbf{R}_+^B$.
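
A cylindrical interval constrains a sample path at finitely many times only. The
following minimal sketch makes this concrete; the dictionary representation, the
function name `in_cylindrical_interval`, and the numerical values are
assumptions chosen for illustration, with each restricted side taken to be a
half-open interval [u, v[.

```python
import math

def in_cylindrical_interval(path, restricted):
    """path: dict mapping time t -> positive value x(t).
    restricted: dict mapping each t_j in N -> (u_j, v_j), the half-open
    side [u_j, v_j[ ; times not listed in `restricted` are unrestricted."""
    return all(u <= path[t] < v for t, (u, v) in restricted.items())

# A hypothetical cylindrical interval I[N] restricted only at times 1 and 2.
I_N = {1: (10.0, 12.0), 2: (11.0, math.inf)}

path_a = {0: 9.0, 1: 10.5, 2: 15.0, 3: 0.1}   # satisfies both restrictions
path_b = {0: 9.0, 1: 12.0, 2: 15.0, 3: 0.1}   # x(1) = 12.0 lies outside [10, 12[

print(in_cylindrical_interval(path_a, I_N))
print(in_cylindrical_interval(path_b, I_N))
```

Only the values at the times in N are inspected; the values of the path at all
other times, however many there are, play no role in the membership test.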

In Figure 2 we can show only three dimensions. As in Figure 1, the fact
that the sample space is unbounded in each of its separate dimensions means
that many of the partitioning intervals have associated points with one or more
components equal to 0 or $\infty$. We have terms $\ln x_j$ in the integrand which are
undefined for $x_j = 0$, just as $\ln \infty$ is undefined. In generalized Riemann
integration, any intervals involving a singularity must have the point of singularity
as the attached or associated point. By arranging things in this way,
generalized Riemann integration avoids having to resort to the improper or Cauchy
extensions when the integrand involves a point of singularity.

In contrast to Figure 1, the partitioning intervals may have different
restricted dimensions. For instance, in Figure 2 the cylindrical interval $I^{11}$ is
restricted only in the vertical direction $t_2$ and is unrestricted in the horizontal
direction $t_1$ and in each of the infinitely many other directions $t \in B \setminus \{t_1, t_2\}$ (of
which only one of the directions perpendicular to both $t_1$ and $t_2$ is shown in the
diagram). This is a particular feature of partitioning infinite-dimensional
domains by means of infinite-dimensional cylindrical intervals, which we must take
account of when we construct Riemann sums of integrands over such partitions.

In this illustration (Figure 2) the cylindrical intervals mostly correspond to
the finite-dimensional intervals of (5), but an extra one, $I^{11}$, has been included
to demonstrate that the restricted dimensions of the cylindrical intervals do not
all have to be the same in a partition of an infinite-dimensional space. (Of course
this is also true for finite-dimensional spaces. We could have included an interval
corresponding to $I^{11}$ in (5), but in partitioning for Riemann sum estimates in
the finite-dimensional case, this kind of interval can be avoided and nothing is
gained by admitting it. But in partitioning infinite-dimensional spaces such
intervals cannot be avoided.)

The intervals in Figure 2 are:

1 1 = [u\,ul[x[uiu 3 2 [xll{BL + :teB\{t 1 ,t 2 }}, 

1 2 = [ulut[x[ulu 5 2 [xU{^+-teB\{t 1 ,t 2 }}, 

1 3 = [ulul[x[u 1 2 ,u 3 2 [xU{^+-teB\{t 1 ,t 2 }}, 

1 4 = [u?,oo[x]0,i4[xn{R+:tG-B\{ti,* 2 }}, 

1 5 = [ul,oo[x[ul,ul[x{flM. + :t£B\{t u t 2 }}, 

1 6 = [uf,oo[x[ui,ca[x{l[R + :teB\{ti,t 2 }}, (16) 

1 7 = [ului[x[uloo[x{Y\R + :teB\{t 1 ,t 2 }}, 
I s = ]0,ul[x[uioo[x{l\R + :teB\{t 1 ,t 2 }}, 
I 9 = }0,u\[x[ulu 3 2 [x{Y\^+:teB\{t 1 ,t 2 }}, 

1 10 = }0,u 3 1 [x}0,u 2 2 [x{U^+:teB\{t 1 ,h}}, 

1 11 = }uiujlxl\{R + :te B,t^t 2 }. 

Criteria (8), (17) place no a priori conditions on the functions f and F in the
integrand when we test it for integrability. There are no required or preferred
kinds of function. It is true that we have required F to be finitely additive, but
this is related to our secondary purpose of constructing an alternative to the
Kolmogorov theory of probability and random variation. Of course, in meeting
the criteria (8), (17), any good properties possessed by f and F may come
into play in order to give us a good encoding. The foregoing remarks may be
translated into language that is more appropriate for statistical data analysis:
there is no necessary a priori morphology for the data cloud to be analyzed; or
there is no necessary a priori model or distribution for the data.



9 A Theory of Joint Variation of Infinitely Many 
Random Variables 

As discussed earlier, the Riemann sum approach can be adapted so that it yields 
a theory of random variation which meets the theoretical and practical needs of 
analysis. 

The adaptation that is needed when only a finite number of random variables 
is involved has been explained already. 

But how can it be adapted to the situation when there are infinitely many 
random variables to be considered jointly? What kind of Riemann sums are 
appropriate in a rigorous theory of joint variation of infinitely many variables? 

In other words, what kind of partitions are permitted in forming the Riemann 
sum approximation to the expected value of a random variable which depends 
on infinitely many underlying random variables? 

In ordinary Riemann integration we form Riemann sums by choosing
partitions whose finite-dimensional intervals have edges (sides) which are bounded
by a positive constant $\delta$. Then we make $\delta$ successively smaller. Likewise for
generalized Riemann integration, where the constant $\delta$ is replaced by a positive
function $\delta(x)$. In any case, we are choosing successive partitions in which the
component intervals successively "shrink" in some sense.

Figure 2: As for Figure 1, unbounded two dimensional domain with partition
used for data encoding, illustrating the use of different restricted dimensions.

For the infinite-dimensional situation, we seek likewise to "shrink" the
cylindrical intervals I[N] of which successive partitions are composed. In Figure 3
we show different ways in which a cylindrical interval can be a subset of a larger
cylindrical interval, and hence seek to establish effective rules by which intervals
of successive partitions can be made successively smaller.

Let the horizontal direction in Figure 3 be denoted $t_1$, denote the vertical
direction by $t_2$, and denote the direction perpendicular to both by $t_3$. Let
B denote the set of all the dimensions, or mutually perpendicular directions,
of the domain $\mathbf{R}_+^B$. Then $I^1$ is the product of a one-dimensional interval in the
direction $t_2$ with $\prod\{\mathbf{R}_+ : t \in B, t \ne t_2\}$. The interval $I^2$, which has the same
form, is a subinterval of $I^1$, in which the side
corresponding to restricted dimension $t_2$ is shorter than the corresponding side
of $I^1$. This kind of "shrinking" is familiar from finite-dimensional Riemann
integration. We get it by imposing a condition that the sides of the intervals be
less than some positive function $\delta$, and then taking $\delta$ successively smaller.

Now consider $I^3$, the product of one-dimensional intervals in the directions
$t_1$ and $t_2$ with $\prod\{\mathbf{R}_+ : t \in B \setminus \{t_1, t_2\}\}$, which is a
subset of $I^2$, in which the restricted side in direction $t_2$ has the same length as
the restricted side of $I^2$; but in which there is an additional restricted dimension
$t_1$. Here we obtain shrinking, without changing $\delta$, but by requiring the interval
to have additional restricted dimensions. We can do this by specifying some
minimal finite set of dimensions in which the interval must be restricted. (We
may allow the interval to be restricted in additional dimensions outside of this
minimal set; just as the sides can be as small as we like provided their length
is bounded by $\delta$.) Then we can obtain shrinking of the intervals by increasing
without limit the number of elements in this minimal finite set, just as we can
obtain shrinking by decreasing towards zero the size of the $\delta$ which bounds the
lengths of the restricted sides.

If we compare $I^4$ with $I^2$ we see both factors at work simultaneously:
increased restricted dimensions and reduced length of sides.
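
The two ways of shrinking can be expressed as a containment test: one
cylindrical interval lies inside another when it is restricted in at least the same
dimensions and each shared restricted side is no longer. The sketch below
assumes non-empty half-open sides [u, v[; the function name `is_subinterval` and
the numerical endpoints are hypothetical and are not read from Figure 3.

```python
def is_subinterval(inner, outer):
    """True if cylindrical interval `inner` is contained in `outer`.
    Each argument is a dict: restricted dimension t -> (u, v) for the side [u, v[."""
    for t, (u_out, v_out) in outer.items():
        if t not in inner:
            # inner is unrestricted in a dimension where outer is restricted
            return False
        u_in, v_in = inner[t]
        if u_in < u_out or v_in > v_out:
            return False
    return True

# Hypothetical endpoints chosen only to mirror the three cases in the text.
I1 = {"t2": (1.0, 4.0)}                     # restricted in t2 only
I2 = {"t2": (2.0, 3.0)}                     # shorter side in t2
I3 = {"t1": (1.0, 2.0), "t2": (2.0, 3.0)}   # same t2 side, extra restriction in t1

print(is_subinterval(I2, I1), is_subinterval(I3, I2), is_subinterval(I2, I3))
```

Here `I2` shrinks `I1` by shortening a side, while `I3` shrinks `I2` by adding a
restricted dimension without changing any side length.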

This provides us with the intuition we need to construct appropriate rules 
for forming partitions for Riemann sums in infinite-dimensional spaces. 

As before, suppose B is a set with a possibly infinite number of elements.
Let $\mathcal{F}$ denote the family of finite subsets N of B. Let a typical $N \in \mathcal{F}$ be
denoted $\{t_1, t_2, \ldots, t_n\}$. Suppose the sample space is $\Omega = \mathbf{R}_+^B$. For $N \in \mathcal{F}$, let
$\mathbf{R}_+^N$ denote the projection of $\Omega$ into the finite set N. Suppose $I_j$ is an interval
of type (9) in $\mathbf{R}_+^{\{t_j\}}$. Then $I_1 \times I_2 \times \cdots \times I_n \times \mathbf{R}_+^{B \setminus N}$ is a cylindrical interval,
denoted I[N]. As before, let $\mathcal{I}$ denote the class of cylindrical intervals obtained
through all choices of $N \in \mathcal{F}$, and all choices of intervals $I_j$ of type (9), for each
$t_j \in N$. A point $x \in \mathbf{R}_+^B$ is associated with a cylindrical interval I[N] if, for each
$t_j \in N$, the component $x_j = x(t_j)$ is associated with $I_j$ in the sense of (10). A
finite collection $\mathcal{E}$ of associated pairs (x, I[N]) is a division of $\mathbf{R}_+^B$ if the finite
number of the cylindrical intervals I[N] form a partition of $\mathbf{R}_+^B$; that is, if they
are disjoint with union $\mathbf{R}_+^B$.

Figure 3: Illustration of different ways in which a cylindrical interval can be
a subset of a larger cylindrical interval; and hence how data encoding level
resolution is supported.

Now define functions $\delta_N$ and L as follows. Let $L : \mathbf{R}_+^B \to \mathcal{F}$, and for each
$N \in \mathcal{F}$ let $\delta_N : \mathbf{R}_+^N \to ]0, \infty[$. The mapping L is defined on the set of associated
points of the cylindrical intervals $I[N] \in \mathcal{I}$; and, for each $N \in \mathcal{F}$, the mapping
$\delta_N$ is a function defined on the set of associated points of intervals $I_1 \times \cdots \times I_n$
in $\mathbf{R}_+^N$.

The sets L(x) and the numbers $\delta_N(x_1, \ldots, x_n)$ determine the kinds of
cylindrical intervals, partitioning the sample space, which we permit in forming
Riemann sums.

A set $L(x) \in \mathcal{F}$ determines a minimal set of restricted dimensions which
must be possessed by any cylindrical interval I[N] associated with x. In other
words, we require that $N \supseteq L(x)$. The numbers $\delta_N(x_1, \ldots, x_n)$ form the bounds
on the lengths of the restricted faces of the cylindrical intervals I[N] associated
with x. Formally, the role of L and $\delta_N$ is as follows.

For any choice of L and any choice of the family $\{\delta_N\}_{N \in \mathcal{F}}$, let $\gamma$ denote
$(L, \{\delta_N\}_{N \in \mathcal{F}})$. We call $\gamma$ a gauge in $\mathbf{R}_+^B$. The class of all gauges is obtained by
varying the choices of the mappings L and $\delta_N$.

Given a gauge $\gamma$, an associated pair (x, I[N]) is $\gamma$-fine provided $N \supseteq L(x)$,
and provided, for each $t_j \in N$, $(x_j, I_j)$ is $\delta_N$-fine, satisfying the relevant
one-dimensional condition with $\delta_N(x_1, x_2, \ldots, x_n)$ in place of $\delta(x)$.
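
The two requirements carried by a gauge can be sketched as a simple check. This
is a simplified illustration under stated assumptions, not the full definition:
it handles only bounded restricted sides [u, v[, it assumes that an associated
component $x_j$ lies in the closed side [u, v], and it assumes that fineness of a
bounded side means v - u is smaller than the bound. The names `is_gamma_fine`,
`L_example` and `delta_example` are hypothetical.

```python
def is_gamma_fine(x, interval, L, delta):
    """x: dict t -> x(t), the associated point.
    interval: dict t_j -> (u_j, v_j), the restricted sides [u_j, v_j[ of I[N].
    L: function x -> set of dimensions that must be restricted.
    delta: function (N, x_N) -> positive bound on restricted side lengths."""
    N = tuple(sorted(interval))
    if not set(L(x)) <= set(N):              # require N to contain L(x)
        return False
    bound = delta(N, tuple(x[t] for t in N))
    # assumed fineness rule for bounded sides: x_j in [u, v] and v - u < bound
    return all(u <= x[t] <= v and (v - u) < bound
               for t, (u, v) in interval.items())

def L_example(x):
    return {1}                               # dimension 1 must always be restricted

def delta_example(N, x_N):
    return 0.5                               # constant bound, for simplicity

point = {1: 2.0, 2: 3.0}
print(is_gamma_fine(point, {1: (2.0, 2.4)}, L_example, delta_example))
print(is_gamma_fine(point, {2: (3.0, 3.2)}, L_example, delta_example))
print(is_gamma_fine(point, {1: (2.0, 3.0)}, L_example, delta_example))
```

The second call fails because the required dimension is not restricted, and the
third fails because the restricted side is too long; these correspond to the two
shrinking mechanisms described above.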

Given a random variable, or function f of x, with a probability distribution
function F defined on the cylindrical intervals I[N] of $\mathcal{I}$, the integrand
f(x)F(I[N]) is integrable in $\mathbf{R}_+^B$, with $\int_{\mathbf{R}_+^B} f(x)F(I[N]) = \alpha$, if, given $\varepsilon > 0$,
there exists a gauge $\gamma$ so that, for every $\gamma$-fine division $\mathcal{E}_\gamma$ of $\mathbf{R}_+^B$, the
corresponding Riemann sum satisfies

$\left| (\mathcal{E}_\gamma) \sum f(x) F(I[N]) - \alpha \right| < \varepsilon.$ (17)

If B is finite, this definition reduces to definition (8), because, as each L(x)
increases, in this case it is not "without limit"; eventually L(x) = B for all x,
and then (17) is equivalent to (8). Also (17) yields results such as Fubini's
Theorem and the Dominated Convergence Theorem (see Muldowney, 1988) which
are needed for the theory of joint variation of infinitely many random variables.

10 Application to Financial Data Analysis

In a number of papers, Muldowney (2000/2001, 2002, 2005) has explored
expectation and, more generally, integral properties of the Black-Scholes model
of derivative asset pricing. In the application studied in this article, we will
consider the finding of structure in empirical financial data. For this we will use
correspondence analysis, because it provides an integrated tool set for assessing
departure from standard behavior in the data.

Correspondence analysis is a data analysis approach based on low-dimensional
spatial projection. Unlike other such approaches, it caters particularly well for
qualitative or categorical input data, including counts. Hence it is an ideal
example of our view that generalized Riemann integration offers a solid theoretical
framework on which to base such an analysis.

Our objectives in this analysis are to take data recoding as proposed in 
Ross (2003) and study it as a type of coding commonly used in correspondence 
analysis. Ross (2003) uses input data recoding to find faint patterns in otherwise 
apparently structureless data. The implications of doing this are important: we 
wish to know if such data recoding can be applied in general to apparently 
structureless financial or other data streams. 

More particularly our objectives are to assess the following: 

1. Using categorical or qualitative coding may allow structure, impercepti- 
ble with quantitative data, to be discovered. Quantile-based categorical 
coding (i.e., the uniform prior case) has beneficial properties, as will be 
demonstrated. But the issue of appropriate coding granularity, or scale of 
problem representation, remains, and we will address this issue below. 

2. In the case of a time-varying data signal (which also holds for spatial
data, mutatis mutandis) non-respect of stationarity should be checked for: 
the consistency of our results will inform us about stationarity present in 
our data. More generally, structures (or models or associations or rela- 
tionships) found in our data are validated through consistency of results 
obtained using subsets of the population studied. 

3. Assessing departure from average behavior is made easy in the analysis
framework adopted. This amounts to fingerprinting the data, i.e. determining
patterns in the data that are characteristic of it.

11 Searching for Structure in Price Processes 
11.1 Data Transformation and Coding 

Using crude oil data, Ross (2003) shows how structure can be found in
apparently geometric Brownian motion, through data recoding. Considering monthly
oil price values, P(i), and then L(i) = log(P(i)), and finally D(i) = L(i) - L(i - 1),
a histogram of D(i) for all i should approximate a Gaussian. The following
recoding, though, gives rise to a somewhat different picture: response categories
or states 1, 2, 3, 4 are used for values of D(i) less than or equal to -0.01,
between the latter and 0, from 0 to 0.01, and greater than the latter. Then a
cross-tabulation of states 1 through 4 for $y_{t+1}$, against states 1 through 4 for $y_t$,
is determined. The cross-tabulation can be expressed as a percentage. Under
geometric Brownian motion, one would expect constant percentages. This is
not what is found. Instead there is appreciable structure in the contingency
table.
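
The recoding and cross-tabulation just described can be sketched as follows. The
function names `code_states` and `transition_percentages`, the treatment of
values falling exactly on 0 or 0.01 (assigned to the lower state), and the short
price list are assumptions made for illustration; the prices are synthetic and
are not the oil data.

```python
import numpy as np

def code_states(prices):
    """Map prices P(i) to states 1..4 of D(i) = log P(i) - log P(i-1)."""
    d = np.diff(np.log(np.asarray(prices, dtype=float)))
    # right=True gives bins ]-inf,-0.01], ]-0.01,0], ]0,0.01], ]0.01,inf[
    return np.digitize(d, [-0.01, 0.0, 0.01], right=True) + 1

def transition_percentages(states, n_states=4):
    """Cross-tabulate state at t (rows) against state at t+1 (columns), as row %."""
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(states[:-1], states[1:]):
        counts[current - 1, nxt - 1] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # rows with no observations are left as zeros rather than dividing by zero
    return 100.0 * counts / np.where(row_totals == 0, 1, row_totals)

prices = [100.0, 98.0, 98.5, 98.6, 105.0]   # synthetic example values
states = code_states(prices)
table = transition_percentages(states)
print(states)
print(table)
```

Under geometric Brownian motion the rows of such a table would be expected to be
approximately equal to one another, so departures between rows are the structure
of interest.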

Ross (2003) pursues exploration of a geometric Brownian motion justification
for Black-Scholes option cost. States-based pricing leads to greater precision
compared to a one-state alternative. The number of states is left open with
both a 4-state and a 6-state analysis discussed (Ross, 2003, chap. 12). A $\chi^2$
test of independence of the contingency table from a product of marginals is
used, with degrees of freedom associated with contingency table row and column
dimensions: this provides a measure of how much structure we have, but not
a comparison between alternative contingency tables. The latter is very fittingly
addressed with the $\chi^2$ metric (see Murtagh, 2005) used in correspondence analysis:
we can say that correspondence analysis is the transformation of pairwise $\chi^2$
distances into Euclidean distances, and that the latter greatly facilitates visualization
(e.g., low-dimensional projection) and interpretation. The total inertia or trace
of the data table grows with contingency table dimensionality, so that is of no
direct help to us. For the futures data used below, and contingency tables of
size 3 x 3, 4 x 4, 5 x 5, 6 x 6, and 10 x 10, we find traces of value: 0.0118,
0.0268, 0.0275, 0.0493, and 0.0681, respectively. Barring the presence of
low-dimensional patterns arising in such a sequence of contingency tables, we will
always find that greater dimensionality implies greater complexity (quantified,
e.g., by trace) and therefore structure.
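
For reference, the total inertia (trace) of a contingency table equals its
$\chi^2$ statistic against the product of marginals, divided by the grand total.
The sketch below computes it directly; the function name `total_inertia` and
the small example tables are assumptions for illustration, and the code assumes
no row or column of the table sums to zero.

```python
import numpy as np

def total_inertia(table):
    """Total inertia (trace) of a contingency table:
    chi-squared statistic against the product of marginals, over the grand total."""
    n = np.asarray(table, dtype=float)
    total = n.sum()
    expected = np.outer(n.sum(axis=1), n.sum(axis=0)) / total  # product of marginals
    chi2 = ((n - expected) ** 2 / expected).sum()
    return chi2 / total

print(total_inertia([[3, 4], [6, 8]]))     # rows proportional: no structure
print(total_inertia([[10, 0], [0, 10]]))   # perfectly associated 2 x 2 table
```

A table whose rows are proportional has zero inertia, which is the reference
against which the traces quoted above measure departure from independence.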

To address the issue of number of coding states to use, in order to search 
for latent structure in such data, one approach that seems very reasonable is to 
explore the dependencies and associations based on fine-grained structure; and 
include in this exploration the possible aggregation of the fine-grained states. 
(Aggregation of states in correspondence analysis is catered for through the 
property of distributional equivalence: see Murtagh, 2005, for discussion.) 

11.2 Granularity of Coding 

We take sets of 2500 values from the time series. Table 3 shows data to be
analyzed derived from time series values 1 to 2500 (identifier i). Further, we
use similar cross-tabulations for values 3001 to 5500 (identifier k), 2001 to 4500 
(identifier m), and values 3600 to 6100 (identifier n). 
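
Quantile coding with 10 categories can be sketched as below, applied to one
segment of the series at a time. The function name `quantile_code`, the use of
NumPy's default (linearly interpolated) quantiles, and the toy input are
assumptions; the article does not specify how ties or boundary values are
handled.

```python
import numpy as np

def quantile_code(values, n_categories=10):
    """Assign each value a category 1..n_categories with (near) equal counts."""
    values = np.asarray(values, dtype=float)
    # interior quantile boundaries, e.g. the 10th, 20th, ..., 90th percentiles
    cuts = np.quantile(values, np.linspace(0, 1, n_categories + 1)[1:-1])
    # a value equal to a boundary is placed in the lower category
    return np.searchsorted(cuts, values, side="left") + 1

codes = quantile_code(np.arange(1, 101))    # toy input: the integers 1..100
print(np.bincount(codes)[1:])               # number of values per category
```

Coding each 2500-value segment independently in this way gives the uniform
marginal counts that the cross-tabulations of current against next-step
categories rely on.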

Figure 4 shows the projections of the profiles in the plane of factors 1 and 2,
using all four data tables, one of which is shown in Table 3. The result is very
consistent: cf. how {i1, k1, m1, n1} are tightly grouped, as are {i2, k2, m2, n2},
reasonably so {i10, k10, m10, n10}, and so on. The full space of all factors has
to be used to verify the clustering seen in this planar (least squares optimal)
projection.

An analysis of clusters found is listed in Table 4. (Contributions to, and
correlations with, the principal factors are used: see Murtagh, 2005, for a
discussion of where these may differ from projections onto the factors. Projections,
e.g. as shown in Figure 4, are descriptive: "what is?", but correlations and
contributions point to influence: "what causes?". Correlations and contributions
are used therefore, in preference to projections.)

In cluster 65, coding category 9 is predominant. In cluster 68, coding
categories 2 and 3 are predominant. Cluster 69 is mixed. Cluster 70 is dominated
by coding category 10. In cluster 71, coding category 8 is predominant. Cluster
72 is defined by coding category 1. Finally, cluster 73 is dominated by coding
category 5.

Figure 4: Factors 1 and 2 with input code categories 1 through 10 defined on
4 different spanning segments of the input data signal. Only input, or current,
values are displayed here. The 4 time series sub-intervals are represented by (in
sequential order) i, m, k, n. The quantile coding is carried out independently
in each set of 10 categories. (Axis label: Factor 1, 39.36% of inertia.)

Table 3: Cross-tabulation of log-differenced futures data using quantile coding
with 10 current and next step price movements. Values 1 to 2500 in the time
series are used. Cross-tabulation results are expressed as percentage (by row).

From the clustering, we provisionally retain coding categories 1; 2 and 3 
together; 5; 8; 9; and 10. We flag response categories 4, 6, and 7 as being 
unclear and best avoided when our aim is prediction of the futures data. 

To check the coding relative to stationarity, we check that the global code
boundaries are close to the time series sub-interval code boundaries. (See
Murtagh, 2005, for more discussion on this, including confirmation of
stationarity.) In broad terms, what we are checking here is the consistency of the
representative elements, found in different subsets of the data, as illustrated
above, right at the start of our presentation in this article, in Table 1.
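
One simple way to carry out such a check is to compare the quantile boundaries
computed on the whole series with those computed on a sub-interval. The sketch
below is a hedged illustration only: the function name `boundary_shift`, the
choice of the largest absolute gap as summary, and the two toy series are
assumptions, and no threshold for "close" is implied by the article.

```python
import numpy as np

def boundary_shift(series, window, n_categories=10):
    """Largest absolute gap between global and sub-interval quantile boundaries."""
    probs = np.linspace(0, 1, n_categories + 1)[1:-1]
    series = np.asarray(series, dtype=float)
    global_cuts = np.quantile(series, probs)
    window_cuts = np.quantile(series[window], probs)
    return float(np.max(np.abs(global_cuts - window_cuts)))

stationary_like = np.tile(np.arange(10.0), 100)   # same value pattern throughout
trending = np.arange(1000.0)                      # steadily increasing values

print(boundary_shift(stationary_like, slice(0, 500)))
print(boundary_shift(trending, slice(0, 500)))
```

A small gap for every sub-interval is consistent with the coding being stable
across the series; a large gap, as for the trending toy series, signals that
sub-interval and global codings would disagree.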

12 Fingerprinting the Price Movements 

Typical movements can be read off in percentage terms in a table such as Table
3. More atypical movements serve to define the strong patterns in our data.

We consider the clusters of current time-step code categories numbered 65,
68, 69, 70, 71, 72, 73 from Table 4, and we ask what are the likely movements, for
one time step. Alternatively expressed, the current code categories are defined
at time step t, and the one-step-ahead code categories are defined at time step
t+1.

We find the following predominant movements in Table 2 (using a thresholded
contribution value, not shown here; we recall that "contribution" is used
in the correspondence analysis sense, meaning mass times projection squared):

Cluster 65, i.e. code category 9: → weakly 8 and more weakly 9.

Cluster 68, i.e. code categories 2 and 3: → 7.

Cluster 69, i.e. mixed code categories: → 6.

Cluster 70, i.e. code category 10: → 10.

Cluster 71, i.e. code category 8

Cluster 72, i.e. code category 1

Cluster 73, i.e. code category 5

Table 4: Table crossing clusters (on I) and coordinates (J), giving correlations
and contributions (as thousandths). Clusters are labeled: 65, 68, 69, 70, 71, 72,
73.

Consider the situation of using these results in an operational setting. From
informative structure, we have found that code category 1 (values less than the
10th percentile, i.e. very low) has a tendency, departing from typical tendencies,
to precede code category 1 (again very low). From any or all of tables such
as Table 3 we can see how often we are likely to have this situation in practice:
19.04% (= average of 23.29% from Table 3 and 17.67%, 16.4%, and 18.8%, from
the other analogous tables not shown here), given that we have code category
1.

Applying a similar fingerprinting analysis to Ross's (2003) oil data, 749 val- 
ues, we found that clustering the initial code categories did not make much 
sense: we retained therefore the trivial partition with all 10 code categories. 
For the output or one-step-ahead future code categories, we agglomerated 6 
and 7, and denoted this cluster as 11. We find the following, generally weak, 
associations derived from the contributions. 

Input code category 6 → output code categories 1, 10 (weak).

Input code category 3 → output code category 2.

Input code category 4 → output code category 4.

Input code categories 9, 2 → output code category 5 (weak).

Input code category 10 → output code category 8.

Not surprisingly, we find very different patterns in the two data sets of 
different natures used, the futures and the oil price signals. 

We have shown that structure can be discovered in data where such structure
is not otherwise apparent. Furthermore we have used correspondence analysis,
availing of its spatial projection and clustering aspects, as a convenient analysis
environment. Validating the conclusions drawn is always most important, and
this is facilitated by (i) semi-interactive data analysis, and (ii) consistency of
results across subsets of the domain under investigation, $\Omega$.

13 Conclusions 

Our new framework for data, and the handling of data (including our defining
of a normed vector space), could be considered in a sense as "only" formalizing
standard data analysis practice. But in the exploration and analysis of complex
phenomena (cf. the search for local structure and patterns in price movements) 
we need to be sure of our belief in how our data express the underlying phe- 
nomena. The traditional Kolmogorov approach based on Lebesgue integration 
and sigma algebras of probability-measurable sets is unnecessarily abstract and 
therefore largely ignored by the "engineering" or pragmatic common sense of 
the data analyst. 

In this article we have shown how the generalized Riemann integral lends
itself to a more transparent definition of probability, in line with empirical data
analysis practice. As a foundation for our data analysis tasks, it achieves a far
better cohesiveness between data and data analyses, vis-à-vis the underlying
phenomena.